- Asia > China > Beijing > Beijing (0.04)
- North America > United States (0.04)
- North America > Canada (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report (0.46)
- Workflow (0.46)
5f268dfb0fbef44de0f668a022707b86-AuthorFeedback.pdf
The reason that the method MSO in "Efficient multi-objective molecular optimization in a continuous latent space" achieved a higher penalized logP with unlimited property evaluations than ours (26.1 vs 15.18) is due to different experimental settings. With a larger Lmax, the best penalized logP score can be significantly increased. We have started running the experiments on GuacaMol as suggested. We will fix these two figures in the final version. All generated molecules in the appendix have been double-checked by both RDKit and human experts.
Deep networks learn to parse uniform-depth context-free languages from local statistics
Parley, Jack T., Cagnetta, Francesco, Wyart, Matthieu
Understanding how the structure of language can be learned from sentences alone is a central question in both cognitive science and machine learning. Studies of the internal representations of Large Language Models (LLMs) support their ability to parse text when predicting the next word, while representing semantic notions independently of surface form. Yet, which data statistics make these feats possible, and how much data is required, remain largely unknown. Probabilistic context-free grammars (PCFGs) provide a tractable testbed for studying these questions. However, prior work has focused either on the post-hoc characterization of the parsing-like algorithms used by trained networks; or on the learnability of PCFGs with fixed syntax, where parsing is unnecessary. Here, we (i) introduce a tunable class of PCFGs in which both the degree of ambiguity and the correlation structure across scales can be controlled; (ii) provide a learning mechanism -- an inference algorithm inspired by the structure of deep convolutional networks -- that links learnability and sample complexity to specific language statistics; and (iii) validate our predictions empirically across deep convolutional and transformer-based architectures. Overall, we propose a unifying framework where correlations at different scales lift local ambiguities, enabling the emergence of hierarchical representations of the data.
- Europe > Switzerland > Vaud > Lausanne (0.04)
- Pacific Ocean > North Pacific Ocean > San Francisco Bay > Golden Gate (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- (10 more...)
- North America > United States (0.46)
- North America > Canada > Quebec > Montreal (0.04)
- North America > United States (0.46)
- North America > Canada > Alberta (0.14)
- North America > Canada > Quebec > Montreal (0.05)
- (2 more...)